Analysis on Gun Violence in the United States of America

1. Introduction

Recently, the US Congress passed a gun control bill, the most significant firearms legislation in nearly 30 years. The bill imposes tougher checks on the purchase of non-militia firearms. The reforms in the bill include:

  • Tougher background checks for buyers younger than 21
  • $15bn in federal funding for mental health programs and school security upgrades
  • Funding to encourage states to implement "red flag" laws to remove firearms from people considered to be a threat
  • Closing the so-called "boyfriend loophole" by blocking gun sales to those convicted of abusing their intimate partners

The passage of the new bill came after mass shootings at a supermarket in Buffalo, New York, and at a primary school in Uvalde, Texas. In October 2017, the Las Vegas shooting claimed 58 lives and left over 500 people injured. In June 2016, a shooting at a nightclub in Orlando claimed 49 lives and left 53 people wounded. These are some of the mass shooting incidents in the USA. I am interested in digging deeper into the issue of gun violence in the USA to uncover potential truths, patterns and trends.

Researching gun violence is important because it can help answer questions such as:

  • What are some of the different trends associated with gun violence over time?
  • Is there a geographical pattern of gun violence in the US, and how do gun control regulations in those locations influence it?
  • Which age groups or genders have a higher probability of resorting to gun violence?
  • Does the recently passed gun-control bill fully address the issues reflected in the data?

2. About the data

The CSV file contains data for all recorded gun violence incidents in the US between January 2013 and March 2018, inclusive. The data was downloaded from Gun Violence Archive. The dataset consists of 239677 observations on 29 variables.

In [2]:
import pandas as pd 
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
In [3]:
df = pd.read_csv("C:/Users/mfaro/Downloads/DATA_01-2013_03-2018.tar/stage3.csv")
df.shape
Out[3]:
(239677, 29)
In [4]:
df.head(2)
Out[4]:
incident_id date state city_or_county address n_killed n_injured incident_url source_url incident_url_fields_missing ... participant_age participant_age_group participant_gender participant_name participant_relationship participant_status participant_type sources state_house_district state_senate_district
0 461105 2013-01-01 Pennsylvania Mckeesport 1506 Versailles Avenue and Coursin Street 0 4 http://www.gunviolencearchive.org/incident/461105 http://www.post-gazette.com/local/south/2013/0... False ... 0::20 0::Adult 18+||1::Adult 18+||2::Adult 18+||3::A... 0::Male||1::Male||3::Male||4::Female 0::Julian Sims NaN 0::Arrested||1::Injured||2::Injured||3::Injure... 0::Victim||1::Victim||2::Victim||3::Victim||4:... http://pittsburgh.cbslocal.com/2013/01/01/4-pe... NaN NaN
1 460726 2013-01-01 California Hawthorne 13500 block of Cerise Avenue 1 3 http://www.gunviolencearchive.org/incident/460726 http://www.dailybulletin.com/article/zz/201301... False ... 0::20 0::Adult 18+||1::Adult 18+||2::Adult 18+||3::A... 0::Male 0::Bernard Gillis NaN 0::Killed||1::Injured||2::Injured||3::Injured 0::Victim||1::Victim||2::Victim||3::Victim||4:... http://losangeles.cbslocal.com/2013/01/01/man-... 62.0 35.0

2 rows × 29 columns

In [5]:
df.tail(2)
Out[5]:
incident_id date state city_or_county address n_killed n_injured incident_url source_url incident_url_fields_missing ... participant_age participant_age_group participant_gender participant_name participant_relationship participant_status participant_type sources state_house_district state_senate_district
239675 1082514 2018-03-31 Texas Houston 12630 Ashford Point Dr 1 0 http://www.gunviolencearchive.org/incident/108... https://www.chron.com/news/houston-texas/houst... False ... 0::42 0::Adult 18+ 0::Male 0::Leroy Ellis NaN 0::Killed 0::Victim http://www.khou.com/article/news/hpd-investiga... 149.0 17.0
239676 1081940 2018-03-31 Maine Norridgewock 434 Skowhegan Rd 2 0 http://www.gunviolencearchive.org/incident/108... https://www.centralmaine.com/2018/03/31/police... False ... 0::58||1::62 0::Adult 18+||1::Adult 18+ 0::Female||1::Male 0::Marie Lancaster Hale||1::William Hale 1::Significant others - current or former 0::Killed||1::Killed 0::Victim||1::Subject-Suspect https://www.centralmaine.com/2018/03/31/police... 111.0 3.0

2 rows × 29 columns

  • The dataframe below shows the percentage of missing values in each column
In [6]:
# Calculating the percentage of NA values in each column
Attributes = []
Missing_value_percentage=[]
for i in df.columns:
    Attributes.append(i)
    Missing_value_percentage.append(round((df[i].isna().sum()/len(df))*100,1))
# Build the summary dataframe once, after the loop has collected all columns
data = {"Attributes": Attributes,
        "Percentage_of_missing_values": Missing_value_percentage}
data1 = pd.DataFrame(data)
data1
Out[6]:
Attributes Percentage_of_missing_values
0 incident_id 0.0
1 date 0.0
2 state 0.0
3 city_or_county 0.0
4 address 6.9
5 n_killed 0.0
6 n_injured 0.0
7 incident_url 0.0
8 source_url 0.2
9 incident_url_fields_missing 0.0
10 congressional_district 5.0
11 gun_stolen 41.5
12 gun_type 41.5
13 incident_characteristics 0.1
14 latitude 3.3
15 location_description 82.4
16 longitude 3.3
17 n_guns_involved 41.5
18 notes 33.8
19 participant_age 38.5
20 participant_age_group 17.6
21 participant_gender 15.2
22 participant_name 51.0
23 participant_relationship 93.4
24 participant_status 11.5
25 participant_type 10.4
26 sources 0.3
27 state_house_district 16.2
28 state_senate_district 13.5
  • Irrelevant columns and columns which have more than 50% missing values were removed to form a new dataset, df2.
In [7]:
df2 = df.drop(['location_description','participant_relationship','address','incident_url','source_url','incident_url_fields_missing','n_guns_involved',
              'incident_characteristics','notes','participant_age_group','participant_name','sources','state_senate_district','state_house_district','congressional_district'],axis=1)
print(df2.shape)
df2.sample(2)
(239677, 14)
Out[7]:
incident_id date state city_or_county n_killed n_injured gun_stolen gun_type latitude longitude participant_age participant_gender participant_status participant_type
31241 192007 2014-08-12 Tennessee Nashville 0 0 NaN NaN 36.1334 -86.6652 NaN NaN NaN NaN
207342 932624 2017-09-06 Louisiana Pineville 0 0 0::Unknown 0::Handgun 31.2491 -92.2788 0::44 0::Male 0::Arrested 0::Subject-Suspect
  • After removing the irrelevant columns and the columns with more than 50% missing data, the dataset now contains 239677 observations on 14 attributes.
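
The 50% rule can also be applied programmatically rather than by listing columns by hand. The following is a minimal sketch under stated assumptions: the `toy` dataframe and the `threshold` variable are hypothetical stand-ins and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the full dataset
toy = pd.DataFrame({
    "incident_id": [1, 2, 3, 4],
    "location_description": [np.nan, np.nan, np.nan, "Park"],      # 75% missing
    "gun_type": ["0::Handgun", np.nan, "0::Rifle", "0::Unknown"],  # 25% missing
})

threshold = 50  # assumed cut-off, in percent

# Percentage of missing values per column
missing_pct = toy.isna().mean() * 100

# Columns whose share of missing values exceeds the threshold
too_sparse = missing_pct[missing_pct > threshold].index.tolist()

toy_reduced = toy.drop(columns=too_sparse)
print(too_sparse)
print(toy_reduced.shape)
```

Columns judged irrelevant for other reasons would still need to be dropped by name, as in the cell above.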

3. Exploratory Data Analysis

3.1 Univariate Analysis

1. Date

In [8]:
df2.date.dtype
Out[8]:
dtype('O')
In [9]:
df2.date.isna().sum()
Out[9]:
0
In [10]:
df2.date.min(),df2.date.max()
Out[10]:
('2013-01-01', '2018-03-31')

The date column is of object data type and has no missing values. The data was recorded from 1 January 2013 to 31 March 2018.

2. State

In [11]:
df2.state.dtype
Out[11]:
dtype('O')
In [12]:
df2.state.isna().sum()
Out[12]:
0
In [13]:
df2.groupby('state')['state'].agg('count')
matplotlib.rcParams['figure.figsize'] = (8,15)
sns.countplot(data=df2,y='state',palette='crest')
Out[13]:
<AxesSubplot:xlabel='count', ylabel='state'>

The state column is of object data type. There are no missing values. From the countplot above, the top 5 states with the most incident reports are:

  1. Illinois
  2. Pennsylvania
  3. Florida
  4. Texas
  5. Ohio
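
Reading a ranking off an unsorted countplot is error-prone, so the counts can also be ranked directly. This is a small sketch that assumes a hypothetical sample of state names; in the notebook the same call would be applied to `df2['state']`.

```python
import pandas as pd

# Hypothetical sample of the state column; the real analysis would use df2['state']
states = pd.Series(["Illinois", "Texas", "Illinois", "Ohio", "Texas", "Illinois", "Florida"])

# value_counts sorts by frequency in descending order, so head(n) returns the top n
top_states = states.value_counts().head(2)
print(top_states)
```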

3. City or county

In [14]:
df2['city_or_county'].isna().sum()
Out[14]:
0
In [15]:
len(df2.groupby('city_or_county')['city_or_county'].agg('count'))
Out[15]:
12898

The dataset contains incidents from 12898 distinct cities or counties in the United States. This attribute has no missing values. Because of this large number, an analysis of the cities or counties with the most and the fewest incidents of gun-related violence is provided later.

4. Number of people killed

In [16]:
df2['n_killed'].isna().sum()
Out[16]:
0
In [17]:
df2['n_killed'] = df2['n_killed'].astype(int)
df2['n_killed'].dtype
Out[17]:
dtype('int32')
In [18]:
df2['n_killed'].describe()
Out[18]:
count    239677.000000
mean          0.252290
std           0.521779
min           0.000000
25%           0.000000
50%           0.000000
75%           0.000000
max          50.000000
Name: n_killed, dtype: float64
In [19]:
df2['n_killed'].value_counts()
Out[19]:
0     185835
1      48436
2       4604
3        595
4        139
5         41
6         11
8          5
9          3
7          2
10         1
11         1
16         1
17         1
27         1
50         1
Name: n_killed, dtype: int64

The dataset contains 239677 incidents of gun violence in the United States. No deaths were recorded in 185835 of the incident reports. One death was recorded in 48436 reports, 2 deaths in 4604 reports, 3 deaths in 595 reports and 4 deaths in 139 reports. Incidents that resulted in the death of 10 or more people are rare: incidents with 10, 11, 16, 17, 27 and 50 deaths were each recorded once.
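
To put these counts in proportion, the share of incidents in each category can be computed from the value counts. The sketch below copies the counts from the output above; the variable names are illustrative and not part of the original notebook.

```python
# Counts copied from the value_counts output above (incidents by number of deaths)
killed_counts = {0: 185835, 1: 48436, 2: 4604, 3: 595, 4: 139}
total_incidents = 239677

# Share of all incidents in each category, in percent
shares = {k: round(v / total_incidents * 100, 2) for k, v in killed_counts.items()}

# Everything not listed above involved five or more deaths
five_plus = total_incidents - sum(killed_counts.values())
print(shares)
print(five_plus)
```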

In [20]:
plt.figure(figsize=(10,7))
# histplot replaces the deprecated distplot; discrete=True draws one bar per integer value
sns.histplot(data=df2,x='n_killed',discrete=True)
plt.title("Distribution of deaths from gun violence")
plt.xlabel("Number of deaths")
plt.ylabel("Frequency")
Out[20]:
Text(0, 0.5, 'Frequency')

5. Number of people injured

In [21]:
df2['n_injured'].isna().sum()
Out[21]:
0
In [22]:
df2['n_injured'].dtype
Out[22]:
dtype('int64')
In [23]:
df2['n_injured'].describe()
Out[23]:
count    239677.000000
mean          0.494007
std           0.729952
min           0.000000
25%           0.000000
50%           0.000000
75%           1.000000
max          53.000000
Name: n_injured, dtype: float64
In [24]:
df2['n_injured'].value_counts()
Out[24]:
0     142487
1      81986
2      11484
3       2513
4        760
5        241
6         91
7         51
8         19
9         12
10         6
12         5
11         4
14         3
19         3
13         2
15         2
16         2
17         2
18         1
20         1
25         1
53         1
Name: n_injured, dtype: int64

Of the 239677 reported incidents, 142487 resulted in no injuries and 81986 resulted in one injured person. A few extreme incidents, as shown above, resulted in more than 20 people injured. Their occurrence is infrequent: for instance, 1 incident resulted in 53 injuries and another resulted in 25 injuries.

In [25]:
plt.figure(figsize=(10,7))
# histplot replaces the deprecated distplot; discrete=True draws one bar per integer value
sns.histplot(data=df2,x='n_injured',discrete=True)
plt.title("Distribution of injuries from gun violence")
plt.xlabel("Number of injuries")
plt.ylabel("Frequency")
Out[25]:
Text(0, 0.5, 'Frequency')

6. Gun status (Stolen, Not-stolen and Unknown)

In [26]:
def StringToDic(S1):
    """Parse a string such as '0::Male||1::Female' into {'0': 'Male', '1': 'Female'}"""
    dic1 = {}
    list1 = str(S1).split('||')
    for i in list1:
        parts = i.split('::')
        # Items without a '::' separator (e.g. 'nan' for missing values) are skipped
        if len(parts) > 1:
            dic1[parts[0]] = parts[1]

    return dic1
In [27]:
def CountDfValue(df,col='gun_type_dic'):
    """Count how often each value occurs across all dictionaries in column col"""
    newDic = {}
    # Iterating over the column directly avoids the overhead of df.iterrows()
    for dic in df[col]:
        for value in dic.values():
            newDic[value] = newDic.get(value, 0) + 1

    return newDic
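
The two helper functions above parse strings of the form `index::value||index::value` and tally the values. As a compact illustration of the same idea, the sketch below uses `collections.Counter`. The function name `count_values` and the sample rows are hypothetical and are not part of the original notebook.

```python
from collections import Counter

def count_values(entries):
    """Tally the values found in strings shaped like '0::Male||1::Female'."""
    counts = Counter()
    for entry in entries:
        for item in str(entry).split('||'):
            parts = item.split('::')
            # Skip items without a '::' separator, such as 'nan' for missing values
            if len(parts) > 1:
                counts[parts[1]] += 1
    return counts

# Hypothetical sample rows in the same format as the participant_gender column
sample = ['0::Male||1::Female', '0::Male', float('nan'), '0::Female||1::Male||2::Male']
result = count_values(sample)
print(result)
```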
In [28]:
df2['gun_stolen_dic'] = df2['gun_stolen'].apply(lambda x: StringToDic(x))
df2['gun_stolen_dic'].sample(10)
Out[28]:
18937                   {}
150167    {'0': 'Unknown'}
177241     {'0': 'Stolen'}
67459      {'0': 'Stolen'}
103690    {'0': 'Unknown'}
73500                   {}
107160                  {}
55234                   {}
24692                   {}
50327                   {}
Name: gun_stolen_dic, dtype: object
In [29]:
dicGunstolen = CountDfValue(df2,'gun_stolen_dic')

dicGunstolen
Out[29]:
{'Unknown': 172525, 'Not-stolen': 1804, 'Stolen': 17610}
In [30]:
# Creating a dataframe for the donut chart
dff = pd.DataFrame([['Unknown',172525],['Not_stolen',1804],['Stolen',17610]],columns=['Gun_status','Number_of_guns'])
dff
Out[30]:
Gun_status Number_of_guns
0 Unknown 172525
1 Not_stolen 1804
2 Stolen 17610
In [31]:
# Plotting a donut chart
plt.figure(figsize=(8,6))
explode = (0,0,0)
plt.style.use('ggplot')
plt.title("The status of guns involved in crime in the US(Stolen/Not-stolen):2013-2018")
plt.pie(x=dff['Number_of_guns'],explode=explode,labels=dff['Gun_status'],autopct = '%.2f%%',shadow=False,startangle=0)
plt.axis('equal')
plt.legend(loc='upper right')
circle = plt.Circle(xy=(0,0),radius = 0.7,facecolor='white')
plt.gca().add_artist(circle)
plt.show()

The status of the majority of guns involved in crime is unknown, at about 90%. About 9% of the guns involved in crime are reported as stolen and only about 1% are reported as not stolen.
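
The percentages shown on the chart can be checked directly from the counts. This is a short sketch that copies the counts from the `dicGunstolen` output; the variable names are illustrative only.

```python
# Counts taken from the dicGunstolen output above
gun_status = {'Unknown': 172525, 'Not-stolen': 1804, 'Stolen': 17610}

total = sum(gun_status.values())

# Share of each status, in percent
percentages = {status: round(count / total * 100, 2) for status, count in gun_status.items()}
print(total)
print(percentages)
```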

7. Type of gun involved

In [32]:
df2['gun_type_dic'] = df2['gun_type'].apply(lambda x: StringToDic(x))
df2['gun_type_dic'].sample(10)
Out[32]:
188252                    {'0': 'Unknown'}
179606                    {'0': 'Unknown'}
91923                                   {}
152623                    {'0': 'Unknown'}
19451                                   {}
127391                      {'0': '22 LR'}
191269    {'0': 'Unknown', '1': 'Unknown'}
224414                    {'0': 'Handgun'}
34611                                   {}
127993                    {'0': '45 Auto'}
Name: gun_type_dic, dtype: object
In [33]:
dicGuntype = CountDfValue(df2,'gun_type_dic')
del dicGuntype['Unknown']
dicGuntype
Out[33]:
{'Handgun': 25038,
 '22 LR': 3346,
 '223 Rem [AR-15]': 1613,
 'Shotgun': 4263,
 '9mm': 6448,
 '45 Auto': 2360,
 '12 gauge': 1112,
 '7.62 [AK-47]': 939,
 '40 SW': 2745,
 '44 Mag': 197,
 'Other': 1060,
 '38 Spl': 1809,
 '380 Auto': 2392,
 '410 gauge': 97,
 '32 Auto': 488,
 '308 Win': 92,
 'Rifle': 5268,
 '357 Mag': 822,
 '16 gauge': 32,
 '30-30 Win': 110,
 '25 Auto': 610,
 '10mm': 50,
 '20 gauge': 205,
 '30-06 Spr': 84,
 '300 Win': 23,
 '28 gauge': 6}
In [34]:
# Convert the dictionary of counts into a dataframe with one row per gun type
new = pd.DataFrame.from_dict([dicGuntype])
gun_type_df = pd.DataFrame.transpose(new)
In [35]:
gun_type_df.reset_index(level=0,inplace=True)
gun_type_df.rename(columns={'index':'gun_type',0:'incidents'},inplace=True)
In [36]:
plt.figure(figsize=(12,5))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(x='gun_type',y='incidents',data=gun_type_df,color='blue')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Types of guns used to commit crime in the US:2013-2018")
plt.ylabel("Incidence of violence")
Out[36]:
Text(0, 0.5, 'Incidence of violence')

The handgun is the most commonly used type of gun to commit crime. Other common types of guns recorded in the data include 9mm, Rifle, Shotgun, 22 LR and 223 Rem [AR-15]. The least recorded type of gun is the 28 gauge.
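
A ranking like this is easier to verify when the counts are sorted rather than read off the bar chart. The sketch below uses a subset of the counts from the `dicGuntype` output; the variable names are illustrative and not part of the original notebook.

```python
# A subset of the counts from the dicGuntype output above
gun_counts = {'Handgun': 25038, '22 LR': 3346, '223 Rem [AR-15]': 1613,
              'Shotgun': 4263, '9mm': 6448, 'Rifle': 5268, '28 gauge': 6}

# Sort by count, largest first
ranked = sorted(gun_counts.items(), key=lambda pair: pair[1], reverse=True)

most_common = ranked[0][0]
least_common = ranked[-1][0]
print(ranked)
```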

8. Latitude and Longitude

In [37]:
# Missing values in the latitude and longitude columns
df2['latitude'].isnull().sum(),df2['longitude'].isnull().sum()
Out[37]:
(7923, 7923)

Missing values will be removed when performing multivariate analysis.
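
One possible way to remove such rows is `dropna` with a `subset` argument, so that only missing coordinates trigger a drop. The sketch below uses a hypothetical three-row frame, not the notebook's data.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the second one has no coordinates
toy = pd.DataFrame({
    'latitude': [36.1334, np.nan, 31.2491],
    'longitude': [-86.6652, np.nan, -92.2788],
    'n_killed': [0, 1, 0],
})

# Keep only rows where both coordinates are present
located = toy.dropna(subset=['latitude', 'longitude'])
print(len(toy) - len(located), "row(s) dropped")
```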

9. Participant age

In [38]:
df2['participant_age']
Out[38]:
0                                     0::20
1                                     0::20
2         0::25||1::31||2::33||3::34||4::33
3                0::29||1::33||2::56||3::33
4                0::18||1::46||2::14||3::47
                        ...                
239672                                0::25
239673                                1::21
239674                                0::21
239675                                0::42
239676                         0::58||1::62
Name: participant_age, Length: 239677, dtype: object
In [39]:
# Converting the column into a dictionary
df2['participant_age_dic'] = df2['participant_age'].apply(lambda x: StringToDic(x))
df2['participant_age_dic']
Out[39]:
0                                               {'0': '20'}
1                                               {'0': '20'}
2         {'0': '25', '1': '31', '2': '33', '3': '34', '...
3              {'0': '29', '1': '33', '2': '56', '3': '33'}
4              {'0': '18', '1': '46', '2': '14', '3': '47'}
                                ...                        
239672                                          {'0': '25'}
239673                                          {'1': '21'}
239674                                          {'0': '21'}
239675                                          {'0': '42'}
239676                               {'0': '58', '1': '62'}
Name: participant_age_dic, Length: 239677, dtype: object
In [40]:
dicparticipant_age = CountDfValue(df2,'participant_age_dic')
In [41]:
# Removing implausible ages (209 and 311), which are likely data-entry errors
del dicparticipant_age['209']
del dicparticipant_age['311']
In [42]:
new2 = pd.DataFrame.from_dict([dicparticipant_age])
new2 = pd.DataFrame.transpose(new2)
In [43]:
new2.reset_index(level=0,inplace=True)
new2.head()
Out[43]:
index 0
0 20 10741
1 25 8933
2 31 5291
3 33 4739
4 34 4573
In [44]:
new2.rename(columns = {'index':'participant_age',0:'Number_of_participants'},inplace=True)
new2.head()
Out[44]:
participant_age Number_of_participants
0 20 10741
1 25 8933
2 31 5291
3 33 4739
4 34 4573
In [45]:
new2['participant_age'] = pd.to_numeric(new2['participant_age'],errors='coerce')
In [46]:
new2['Number_of_participants'] = pd.to_numeric(new2['Number_of_participants'],errors='coerce')
new2['Number_of_participants'].dtype,new2['participant_age'].dtype
Out[46]:
(dtype('int64'), dtype('int64'))
In [47]:
new2.shape
Out[47]:
(102, 2)
In [48]:
new2.head()
Out[48]:
participant_age Number_of_participants
0 20 10741
1 25 8933
2 31 5291
3 33 4739
4 34 4573
In [49]:
import plotly
from plotly.offline import init_notebook_mode, iplot
import chart_studio.plotly as py
import plotly.graph_objs as go
from plotly import tools
In [50]:
trace1 = go.Bar(
    x=new2.participant_age,
    y=new2.Number_of_participants,
    name='Age distribution of participants',
    marker=dict(
        color='rgb(55, 83, 109)'))
In [51]:
data = [trace1]
layout = go.Layout(
    title='Age Distribution of Participants',
    xaxis=dict(
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)',
        ),
        range=[0,100]
    ),
    yaxis=dict(
        title='Count',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    legend=dict(
        x=0,
        y=1.0,
        bgcolor='rgba(255, 255, 255, 0)',
        bordercolor='rgba(255, 255, 255, 0)'
    ),
    barmode='group',
    bargap=0.15,
    bargroupgap=0.1
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

The figure above shows a bar chart of the age distribution of participants in gun violence. This includes both suspects and victims. The age group with the highest number of people involved in gun-related violence is 16-40 years. The distribution of the participants' ages is skewed to the right, that is, the tail on the right is longer.
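
The share of participants in a given age range can be quantified by binning the ages. The following sketch uses `pd.cut` on hypothetical counts; the bin edges and labels are assumptions chosen to isolate the 16-40 range, and in the notebook the same steps would be applied to `new2`.

```python
import pandas as pd

# Hypothetical counts per age; the real analysis would use the new2 dataframe
ages = pd.DataFrame({
    'participant_age': [12, 20, 25, 33, 45, 70],
    'Number_of_participants': [5, 100, 90, 50, 30, 10],
})

# Assumed bin edges: (0, 15], (15, 40] and (40, 120]
bins = [0, 15, 40, 120]
labels = ['0-15', '16-40', '41+']
ages['age_group'] = pd.cut(ages['participant_age'], bins=bins, labels=labels)

# Total participants per age group and each group's share of the total
group_totals = ages.groupby('age_group', observed=False)['Number_of_participants'].sum()
group_share = (group_totals / group_totals.sum() * 100).round(1)
print(group_share)
```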

10. Gender

In [52]:
df2['participant_gender']
Out[52]:
0                0::Male||1::Male||3::Male||4::Female
1                                             0::Male
2         0::Male||1::Male||2::Male||3::Male||4::Male
3                0::Female||1::Male||2::Male||3::Male
4              0::Female||1::Male||2::Male||3::Female
                             ...                     
239672                                      0::Female
239673                               0::Male||1::Male
239674                                        0::Male
239675                                        0::Male
239676                             0::Female||1::Male
Name: participant_gender, Length: 239677, dtype: object

From observation, the format of this column is the same as that of some of the columns above. We are going to use the functions StringToDic and CountDfValue to count the number of males and females involved in gun-related crime. The figures for each gender include both victims and suspects.

In [53]:
# Converting the column into a dictionary
df2['participant_gender_dic'] = df2['participant_gender'].apply(lambda x: StringToDic(x))
df2['participant_gender_dic']
Out[53]:
0         {'0': 'Male', '1': 'Male', '3': 'Male', '4': '...
1                                             {'0': 'Male'}
2         {'0': 'Male', '1': 'Male', '2': 'Male', '3': '...
3         {'0': 'Female', '1': 'Male', '2': 'Male', '3':...
4         {'0': 'Female', '1': 'Male', '2': 'Male', '3':...
                                ...                        
239672                                      {'0': 'Female'}
239673                           {'0': 'Male', '1': 'Male'}
239674                                        {'0': 'Male'}
239675                                        {'0': 'Male'}
239676                         {'0': 'Female', '1': 'Male'}
Name: participant_gender_dic, Length: 239677, dtype: object
In [54]:
dicGender = CountDfValue(df2,'participant_gender_dic')
del dicGender['Male, female']
dicGender
Out[54]:
{'Male': 304102, 'Female': 42376}
In [55]:
dff = pd.DataFrame({'Gender': ['Male','Female'],'Participants': [304102,42376]})
dff
Out[55]:
Gender Participants
0 Male 304102
1 Female 42376
In [56]:
# Plotting a donut chart
plt.figure(figsize=(8,6))
explode = (0,0)
plt.style.use('ggplot')
plt.title("Number of participants by gender involved in gun-related violence: 2013-2018")
plt.pie(x=dff['Participants'],explode=explode,labels=dff['Gender'],autopct = '%.2f%%',shadow=False,startangle=0)
plt.axis('equal')
plt.legend(loc='upper right')
circle = plt.Circle(xy=(0,0),radius = 0.7,facecolor='white')
plt.gca().add_artist(circle)
plt.show()

From the pie chart above, of the total number of participants (suspects and victims), 88% are men and 12% are women.

11. Participant Status

In [57]:
# Checking the format of the column
df2['participant_status'].head()
Out[57]:
0    0::Arrested||1::Injured||2::Injured||3::Injure...
1        0::Killed||1::Injured||2::Injured||3::Injured
2    0::Injured, Unharmed, Arrested||1::Unharmed, A...
3           0::Killed||1::Killed||2::Killed||3::Killed
4         0::Injured||1::Injured||2::Killed||3::Killed
Name: participant_status, dtype: object

From observation, the format of this column is the same as that of some of the columns above. We are going to use the functions StringToDic and CountDfValue to count the number of participants in each status category. The figures include both victims and suspects.

In [58]:
df2['participant_status_dic'] = df2['participant_status'].apply(lambda x: StringToDic(x))
df2['participant_status_dic'].head(3)
Out[58]:
0    {'0': 'Arrested', '1': 'Injured', '2': 'Injure...
1    {'0': 'Killed', '1': 'Injured', '2': 'Injured'...
2    {'0': 'Injured, Unharmed, Arrested', '1': 'Unh...
Name: participant_status_dic, dtype: object
In [59]:
dicStatus = CountDfValue(df2,'participant_status_dic')
dicStatus
Out[59]:
{'Arrested': 10169,
 'Injured': 113332,
 'Killed': 59386,
 'Injured, Unharmed, Arrested': 22,
 'Unharmed, Arrested': 85388,
 'Unharmed': 100822,
 'Injured, Arrested': 3467,
 'Killed, Unharmed, Arrested': 14,
 'Injured, Unharmed': 31,
 'Killed, Injured': 10,
 'Killed, Unharmed': 21,
 'Killed, Arrested': 51}

As seen from the dictionary above, some entries do not make sense, for example: killed and unharmed; killed and arrested; killed and injured; injured and unharmed; and killed, unharmed and arrested. These entries are removed from the dictionary.
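
The contradictory entries can also be identified by a rule rather than listed by hand. The sketch below assumes one possible rule, stated in the docstring; the function name `is_contradictory` is hypothetical and the rule is an assumption, not something defined by the dataset.

```python
# Counts taken from the dicStatus output above
status_counts = {'Arrested': 10169, 'Injured': 113332, 'Killed': 59386,
                 'Injured, Unharmed, Arrested': 22, 'Unharmed, Arrested': 85388,
                 'Unharmed': 100822, 'Injured, Arrested': 3467,
                 'Killed, Unharmed, Arrested': 14, 'Injured, Unharmed': 31,
                 'Killed, Injured': 10, 'Killed, Unharmed': 21, 'Killed, Arrested': 51}

def is_contradictory(status):
    """Assumed rule: a participant has at most one of Killed, Injured, Unharmed,
    and a killed participant cannot also be arrested."""
    labels = {part.strip() for part in status.split(',')}
    outcomes = labels & {'Killed', 'Injured', 'Unharmed'}
    return len(outcomes) > 1 or {'Killed', 'Arrested'} <= labels

# Keep only the entries that pass the rule
cleaned = {s: n for s, n in status_counts.items() if not is_contradictory(s)}
removed_total = sum(n for s, n in status_counts.items() if is_contradictory(s))
print(cleaned)
print(removed_total)
```

Under this rule the removed entries account for a very small number of participants compared with the retained categories.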

In [60]:
del dicStatus['Killed, Unharmed, Arrested']
del dicStatus['Injured, Unharmed']
del dicStatus['Killed, Injured']
del dicStatus['Killed, Unharmed']
del dicStatus['Killed, Arrested']
del dicStatus['Injured, Unharmed, Arrested']
dicStatus
Out[60]:
{'Arrested': 10169,
 'Injured': 113332,
 'Killed': 59386,
 'Unharmed, Arrested': 85388,
 'Unharmed': 100822,
 'Injured, Arrested': 3467}
In [61]:
new3 = pd.DataFrame.from_dict([dicStatus])
new3.head()
Out[61]:
Arrested Injured Killed Unharmed, Arrested Unharmed Injured, Arrested
0 10169 113332 59386 85388 100822 3467
In [62]:
new3 = pd.DataFrame.transpose(new3)
In [63]:
new3 = new3.reset_index(level=0)
new3
Out[63]:
index 0
0 Arrested 10169
1 Injured 113332
2 Killed 59386
3 Unharmed, Arrested 85388
4 Unharmed 100822
5 Injured, Arrested 3467
In [64]:
new3.rename(columns={'index':'Status',0:'Participants'},inplace= True)
new3.head()
Out[64]:
Status Participants
0 Arrested 10169
1 Injured 113332
2 Killed 59386
3 Unharmed, Arrested 85388
4 Unharmed 100822
In [65]:
plt.figure(figsize=(12,4))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(x='Status',y='Participants',data=new3,color='blue')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Participants' status")
plt.ylabel("Number of participants")
Out[65]:
Text(0, 0.5, 'Number of participants')

12. Participant type

In [66]:
df2['participant_type'].head()
Out[66]:
0    0::Victim||1::Victim||2::Victim||3::Victim||4:...
1    0::Victim||1::Victim||2::Victim||3::Victim||4:...
2    0::Subject-Suspect||1::Subject-Suspect||2::Vic...
3    0::Victim||1::Victim||2::Victim||3::Subject-Su...
4    0::Victim||1::Victim||2::Victim||3::Subject-Su...
Name: participant_type, dtype: object

From observation, the format of this column is the same as that of some of the columns above. We are going to use the functions StringToDic and CountDfValue to count the number of victims and suspects involved in gun-related crime.

In [67]:
df2['participant_type_dic'] = df2['participant_type'].apply(lambda x: StringToDic(x))
df2['participant_type_dic'].head()
Out[67]:
0    {'0': 'Victim', '1': 'Victim', '2': 'Victim', ...
1    {'0': 'Victim', '1': 'Victim', '2': 'Victim', ...
2    {'0': 'Subject-Suspect', '1': 'Subject-Suspect...
3    {'0': 'Victim', '1': 'Victim', '2': 'Victim', ...
4    {'0': 'Victim', '1': 'Victim', '2': 'Victim', ...
Name: participant_type_dic, dtype: object
In [68]:
participanttype_dic = CountDfValue(df2,'participant_type_dic')
In [69]:
participanttype_dic
Out[69]:
{'Victim': 189600, 'Subject-Suspect': 195913}
In [70]:
dff2 = pd.DataFrame({'Status':['Victim','Suspect'],'Frequency':[189600,195913]})
dff2
Out[70]:
Status Frequency
0 Victim 189600
1 Suspect 195913
In [71]:
# Plotting a donut chart
plt.figure(figsize=(8,6))
explode = (0,0)
plt.style.use('ggplot')
plt.title("Number of victims and suspects involved in gun-related violence: 2013-2018")
plt.pie(x=dff2['Frequency'],explode=explode,labels=dff2['Status'],autopct = '%.2f%%',shadow=False,startangle=0)
plt.axis('equal')
plt.legend(loc='upper right')
circle = plt.Circle(xy=(0,0),radius = 0.7,facecolor='white')
plt.gca().add_artist(circle)
plt.show()

The pie chart above shows that the number of victims and suspects is balanced: 49% of the participants are victims and 51% are suspects.

3.2 Multivariate Analysis

1. Time Series Analysis

In [72]:
df3 = df2.copy()
In [73]:
df3.date.dtype
Out[73]:
dtype('O')
In [74]:
df3['date'] = pd.to_datetime(df3.date)
df3.date.dtype
Out[74]:
dtype('<M8[ns]')
In [75]:
df3 = df3.assign(year = df3['date'].map(lambda dates: dates.year))
df3 = df3.assign(month = df3['date'].map(lambda dates: dates.month))
# day holds the day of the week (Monday = 0, Sunday = 6)
df3 = df3.assign(day = df3['date'].map(lambda dates: dates.weekday()))
df3.sample(3)
Out[75]:
incident_id date state city_or_county n_killed n_injured gun_stolen gun_type latitude longitude ... participant_type gun_stolen_dic gun_type_dic participant_age_dic participant_gender_dic participant_status_dic participant_type_dic year month day
143547 642432 2016-08-29 Connecticut Hartford 0 1 0::Unknown 0::Unknown 41.7809 -72.6872 ... 0::Victim {'0': 'Unknown'} {'0': 'Unknown'} {'0': '27'} {'0': 'Male'} {'0': 'Injured'} {'0': 'Victim'} 2016 8 0
217139 982223 2017-11-05 Washington Darrington 0 0 0::Unknown 0::22 LR 48.2778 -121.5710 ... 0::Subject-Suspect {'0': 'Unknown'} {'0': '22 LR'} {} {'0': 'Male'} {'0': 'Unharmed'} {'0': 'Subject-Suspect'} 2017 11 6
122213 544601 2016-04-20 Connecticut Meriden 0 2 NaN NaN 41.5368 -72.8090 ... 0::Victim||1::Victim {} {} {'0': '27', '1': '23'} {'0': 'Male', '1': 'Male'} {'0': 'Injured', '1': 'Injured'} {'0': 'Victim', '1': 'Victim'} 2016 4 2

3 rows × 23 columns

1.1 Number of people killed per year: 2013-2018
In [76]:
y_years = df3.groupby('year')['incident_id'].count().index.values
y_years
Out[76]:
array([2013, 2014, 2015, 2016, 2017, 2018], dtype=int64)
In [77]:
x_killed = df3.groupby('year')['n_killed'].sum()
n_killeddf = pd.DataFrame({'year':y_years,'Number_of_people':x_killed})
In [78]:
n_killeddf
Out[78]:
year Number_of_people
year
2013 2013 317
2014 2014 12557
2015 2015 13484
2016 2016 15066
2017 2017 15511
2018 2018 3533
In [79]:
plt.figure(figsize=(10,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Number_of_people',x='year',data=n_killeddf,color='#1f77b4')
plt.title("Number of people killed in the US per year because of gun violence: 2013-2018")
plt.ylabel("Number of people")
Out[79]:
Text(0, 0.5, 'Number of people')

A total of 278 incidents are recorded in the data for the period between January 2013 and December 2013, and these incidents resulted in 317 deaths. It is odd that only 278 incidents were reported in 2013, considering that more than 50000 incidents were recorded in each of the subsequent full years. The number of people killed by gun violence increased by about 24% from 2014 to 2017. Please note that the data recorded for 2018 covers only January to March, which explains the significant decrease in deaths shown in the figure above.
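
The percentage increase quoted above follows from the yearly totals. The sketch below shows the arithmetic; the helper name `pct_change` is illustrative and not part of the original notebook.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100

# Yearly death totals copied from the n_killeddf output above
killed_2014, killed_2017 = 12557, 15511

increase = pct_change(killed_2014, killed_2017)
print(round(increase, 1))
```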

1.2 Number of people injured per year
In [80]:
x_injured = df3.groupby('year')['n_injured'].sum()
n_injureddf = pd.DataFrame({'year':y_years,'Number_of_people':x_injured})
n_injureddf
Out[80]:
year Number_of_people
year
2013 2013 979
2014 2014 23002
2015 2015 26967
2016 2016 30580
2017 2017 30703
2018 2018 6171
In [81]:
plt.figure(figsize=(10,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Number_of_people',x='year',data=n_injureddf,color='#1f77b4')
plt.title("Number of people injured in the US per year because of gun violence: 2013-2018")
plt.ylabel("Number of people")
Out[81]:
Text(0, 0.5, 'Number of people')

The 278 incidents reported in 2013 resulted in 979 injuries. The number of people injured increased by about 33% from 2014 to 2017 (from 23002 to 30703). The injuries display a similar trend to the number of people killed. There is a significant decrease in the number of injuries in 2018 because the incidents recorded are only from January to March 2018.

1.3 Number of people killed per month
In [82]:
y_months = df3.groupby('month')['incident_id'].count().index.values
y_months
Out[82]:
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12], dtype=int64)
In [83]:
months = np.array(['x','January','February','March','April','May','June','July','August','September','October','November','December'])
months[y_months]
Out[83]:
array(['January', 'February', 'March', 'April', 'May', 'June', 'July',
       'August', 'September', 'October', 'November', 'December'],
      dtype='<U9')
In [84]:
month_killed = df3.groupby('month')['n_killed'].sum()
In [85]:
df_month_killed = pd.DataFrame({'month':months[y_months],'Number_of_people':month_killed},index=[1,2,3,4,5,6,7,8,9,10,11,12])
df_month_killed
Out[85]:
month Number_of_people
1 January 6035
2 February 4945
3 March 5641
4 April 4383
5 May 4830
6 June 4886
7 July 5276
8 August 5127
9 September 4779
10 October 4791
11 November 4848
12 December 4927
In [86]:
plt.figure(figsize=(10,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Number_of_people',x='month',data=df_month_killed,color='#1f77b4')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Total number of people killed in the US per month because of gun violence: 2013-2018")
plt.ylabel("Number of people")
Out[86]:
Text(0, 0.5, 'Number of people')

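The figure above shows that the number of people killed each month is more or less uniform. However, January recorded the highest number of deaths, followed by March. Note that January to March are also the only months covered by the 2018 data, so their totals include one more year than the other months.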
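Because the 2018 records stop in March, the monthly totals above add up a different number of years for January to March than for the remaining months. One possible way to compare months on an equal footing is to average the monthly totals over the years in which each month appears. The sketch below does this on a toy frame with made-up values; only the column names (`year`, `month`, `n_killed`) follow the notebook.

```python
import pandas as pd

# Toy stand-in for df3 with hypothetical values.
toy = pd.DataFrame({
    "year":     [2016, 2016, 2017, 2017, 2018],
    "month":    [1, 4, 1, 4, 1],
    "n_killed": [10, 8, 12, 6, 11],
})

# Total deaths for every (year, month) pair that appears in the data.
per_year_month = toy.groupby(["year", "month"])["n_killed"].sum()

# Average over the years in which each month was observed.
monthly_mean = per_year_month.groupby(level="month").mean()
print(monthly_mean)
```

In this toy data January is observed in three years and April in two, so each mean is taken over a different number of years.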

1.4 Number of incidents per year

In [87]:
x_inc = df3.groupby('year')['incident_id'].agg('count')
x_inc
Out[87]:
year
2013      278
2014    51854
2015    53579
2016    58763
2017    61401
2018    13802
Name: incident_id, dtype: int64
In [88]:
inc_df_year = pd.DataFrame({'year':y_years,'Incidents':x_inc})
inc_df_year
Out[88]:
year Incidents
year
2013 2013 278
2014 2014 51854
2015 2015 53579
2016 2016 58763
2017 2017 61401
2018 2018 13802
In [89]:
plt.figure(figsize=(10,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Incidents',x='year',data=inc_df_year,color='#1f77b4')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Total number of gun violence incidents per year in the US: 2013-2018")
plt.ylabel("Number of Incidents")
Out[89]:
Text(0, 0.5, 'Number of Incidents')

2013 recorded the fewest incidents of gun-related violence. The number of incidents reported increased by 18% from 2014 to 2017. The sharp drop in incidents recorded in 2018 is because the data available covers only 3 months (January to March). Reported incidents follow a similar trend to the number of people killed and the number of people injured.

1.5 Total number of incidents each month
In [90]:
inc_month = df3.groupby('month')['incident_id'].agg('count')
In [91]:
inc_month_df = pd.DataFrame({'month':months[y_months],'Incidents':inc_month})
In [92]:
plt.figure(figsize=(10,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Incidents',x='month',data=inc_month_df,color='#1f77b4')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Total number of gun violence incidents each month in the US: 2013-2018")
plt.ylabel("Number of Incidents")
Out[92]:
Text(0, 0.5, 'Number of Incidents')

January recorded the highest number of gun-related incidents, followed by March. Otherwise, the number of incidents recorded each month is more or less uniform.

1.6 Number of incidents each day
In [93]:
index = df3.groupby('day')['incident_id'].count().index.values
In [94]:
Incidents = df3.groupby('day')['incident_id'].agg('count')
In [95]:
days = np.array(['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'])
In [96]:
df1_day = pd.DataFrame({'day':days[index],'Incidents':Incidents},index=[0,1,2,3,4,5,6])
df1_day
Out[96]:
day Incidents
0 Monday 33760
1 Tuesday 33307
2 Wednesday 34126
3 Thursday 32561
4 Friday 32775
5 Saturday 36096
6 Sunday 37052
In [97]:
plt.figure(figsize=(8,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Incidents',x='day',data=df1_day,color='#1f77b4')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Total number of gun violence incidents each day in the US: 2013-2018")
plt.ylabel("Number of Incidents")
Out[97]:
Text(0, 0.5, 'Number of Incidents')

The figure above shows the number of incidents of gun violence recorded for each day of the week. Saturday and Sunday recorded the highest number of incidents, and Thursday the lowest. Incidents reported on Saturday and Sunday are 11% and 14% higher, respectively, than incidents reported on Thursday.
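The lookup `days[index]` above assumes that `day` is coded 0 for Monday through 6 for Sunday, which is the convention used by pandas' `dt.dayofweek`. How the `day` column was actually derived is not shown in this section, so the following is only a sketch under that assumption, using three hypothetical dates.

```python
import pandas as pd

# Three hypothetical incident dates: a Sunday, a Saturday and a Thursday.
toy = pd.DataFrame({"date": pd.to_datetime(["2017-10-01", "2018-03-31", "2018-03-29"])})

toy["day"] = toy["date"].dt.dayofweek        # Monday=0 ... Sunday=6
toy["day_name"] = toy["date"].dt.day_name()  # readable label, no lookup array needed

# Count incidents per weekday and keep calendar order, filling absent days with 0.
order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
counts = toy["day_name"].value_counts().reindex(order, fill_value=0)
print(counts)
```

Using `day_name()` avoids maintaining a separate array of labels, at the cost of needing an explicit ordering for plots.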

1.7 Total number of deaths per day of the week
In [98]:
nkilled_d = df3.groupby('day')['n_killed'].sum()
In [99]:
dayk_df = pd.DataFrame({'day':days[index],'Number_of_deaths':nkilled_d},index=[0,1,2,3,4,5,6])
dayk_df
Out[99]:
day Number_of_deaths
0 Monday 8383
1 Tuesday 7917
2 Wednesday 8083
3 Thursday 7732
4 Friday 8353
5 Saturday 9813
6 Sunday 10187
In [100]:
plt.figure(figsize=(8,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Number_of_deaths',x='day',data=dayk_df,color='#1f77b4')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Total number of gun violence deaths each day in the US: 2013-2018")
plt.ylabel("Number of deaths")
Out[100]:
Text(0, 0.5, 'Number of deaths')

The figure above shows the number of deaths caused by gun-related violence for each day of the week. Saturday and Sunday recorded the highest number of deaths, and Thursday the lowest. The number of deaths recorded on Saturday is 27% higher than on Thursday, and the number recorded on Sunday is 32% higher.

1.8 Total number of injuries for each week day
In [101]:
injured_day = df3.groupby('day')['n_injured'].sum()
In [102]:
injured_ddf = pd.DataFrame({'day':days[index],'Number_of_people':injured_day},index = [0,1,2,3,4,5,6])
injured_ddf
Out[102]:
day Number_of_people
0 Monday 16164
1 Tuesday 15168
2 Wednesday 15193
3 Thursday 14481
4 Friday 15275
5 Saturday 20559
6 Sunday 21562
In [103]:
plt.figure(figsize=(8,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Number_of_people',x='day',data=injured_ddf,color='#1f77b4')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Total number of gun violence injuries each day in the US: 2013-2018")
plt.ylabel("Number of injuries")
Out[103]:
Text(0, 0.5, 'Number of injuries')

The highest number of injuries caused by gun-related violence was recorded on Sunday, followed by Saturday. The lowest number was recorded on Thursday. Injuries recorded on Sunday are 49% higher than on Thursday, and injuries recorded on Saturday are 42% higher.
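The weekend-versus-Thursday percentages in sections 1.6 to 1.8 can be reproduced in one step by dividing each day's total by the Thursday total. The sketch below uses the figures from the three day-of-week tables above; the variable names are illustrative only.

```python
import pandas as pd

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Totals copied from the day-of-week tables in sections 1.6, 1.7 and 1.8.
by_day = pd.DataFrame({
    "incidents": [33760, 33307, 34126, 32561, 32775, 36096, 37052],
    "deaths":    [8383, 7917, 8083, 7732, 8353, 9813, 10187],
    "injuries":  [16164, 15168, 15193, 14481, 15275, 20559, 21562],
}, index=days)

# Percentage by which each day exceeds Thursday, computed column by column.
pct_above_thursday = (by_day / by_day.loc["Thursday"] - 1) * 100
print(pct_above_thursday.loc[["Saturday", "Sunday"]].round(0))
```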

2. Geographical Analysis

2.1 Reported Incidents by state
In [104]:
state_incident = df3.groupby('state')['state'].agg('count')
In [105]:
states = df3.groupby('state')['state'].count().index.values
In [106]:
data = {'State': states,'Number_of_incidents':state_incident}
df_state = pd.DataFrame(data)
In [107]:
top10_incidents = df_state.loc[:,['State','Number_of_incidents']].sort_values(by='Number_of_incidents',ascending=False).head(10)
In [110]:
top10_incidents = top10_incidents.reset_index(drop=True)
top10_incidents
Out[110]:
State Number_of_incidents
0 Illinois 17556
1 California 16306
2 Florida 15029
3 Texas 13577
4 Ohio 10244
5 New York 9712
6 Pennsylvania 8929
7 Georgia 8925
8 North Carolina 8739
9 Louisiana 8103
In [112]:
plt.figure(figsize=(8,4))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Number_of_incidents',x='State',data=top10_incidents,palette='crest')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Top 10 USA States:2013-2018")
plt.ylabel("Number of incidents")
Out[112]:
Text(0, 0.5, 'Number of incidents')
In [111]:
bottom10_incidents = df_state.loc[:,['State','Number_of_incidents']].sort_values(by='Number_of_incidents',ascending=True).head(10)
bottom10_incidents = bottom10_incidents.reset_index(drop=True)
bottom10_incidents
Out[111]:
State Number_of_incidents
0 Hawaii 289
1 Vermont 472
2 Wyoming 494
3 South Dakota 544
4 North Dakota 573
5 Montana 638
6 Idaho 661
7 Rhode Island 895
8 Maine 907
9 New Hampshire 964
In [113]:
plt.figure(figsize=(8,4))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Number_of_incidents',x='State',data=bottom10_incidents,palette='crest')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Bottom 10 USA States:2013-2018")
plt.ylabel("Number of incidents")
Out[113]:
Text(0, 0.5, 'Number of incidents')
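The per-state tables above are built in several steps (group, count, wrap in a DataFrame, sort, take the head). As an alternative sketch, `value_counts` combined with `nlargest` and `nsmallest` gives the same kind of table in fewer steps. The toy frame below is hypothetical; only the `state` column name comes from the notebook.

```python
import pandas as pd

# Toy stand-in for df3: one row per incident.
toy = pd.DataFrame({"state": ["Illinois", "Illinois", "Texas", "Hawaii", "Texas", "Illinois"]})

# One row per state with its incident count.
state_counts = (
    toy["state"]
    .value_counts()
    .rename_axis("State")
    .reset_index(name="Number_of_incidents")
)

top = state_counts.nlargest(2, "Number_of_incidents").reset_index(drop=True)
bottom = state_counts.nsmallest(1, "Number_of_incidents").reset_index(drop=True)
print(top)
print(bottom)
```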
2.2 Reported incidents by city
In [167]:
cities = df3.groupby('city_or_county')['city_or_county'].count().index.values
In [168]:
Number_of_incidents = df3.groupby('city_or_county')['city_or_county'].agg('count')
data = {'city': cities,'Number_of_incidents':Number_of_incidents}
In [169]:
df_city = pd.DataFrame(data)
In [170]:
top10_city = df_city.loc[:,['city','Number_of_incidents']].sort_values(by='Number_of_incidents',ascending=False).head(10)
In [171]:
top10_city = top10_city.reset_index(drop=True)
top10_city
Out[171]:
city Number_of_incidents
0 Chicago 10814
1 Baltimore 3943
2 Washington 3279
3 New Orleans 3071
4 Philadelphia 2963
5 Houston 2501
6 Saint Louis 2501
7 Milwaukee 2487
8 Jacksonville 2448
9 Memphis 2386
In [120]:
plt.figure(figsize=(8,4))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Number_of_incidents',x='city',data=top10_city,palette='crest')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Top 10 USA Cities or Counties:2013-2018")
plt.ylabel("Number of incidents")
Out[120]:
Text(0, 0.5, 'Number of incidents')
In [172]:
bottom10_city = df_city.loc[:,['city','Number_of_incidents']].sort_values(by='Number_of_incidents',ascending=True).head(10)
In [173]:
bottom10_city = bottom10_city.reset_index(drop=True)
bottom10_city
Out[173]:
city Number_of_incidents
0 jefferson parish (county) 1
1 Iberia (county) 1
2 Iberia 1
3 Hypoluxo 1
4 Hyndman 1
5 Hyden 1
6 St Petersburg 1
7 Hydaburg 1
8 Hyattsville (Tuxedo) 1
9 Hyattsville (Palmer Park) 1
In [122]:
plt.figure(figsize=(8,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Number_of_incidents',x='city',data=bottom10_city,palette='crest')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Bottom 10 USA Cities or Counties:2013-2018")
plt.ylabel("Number of incidents")
Out[122]:
Text(0, 0.5, 'Number of incidents')
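The bottom-10 list above contains entries such as "Iberia (county)" next to "Iberia", a lower-case "jefferson parish (county)", and "St Petersburg" with a single incident, which suggests that some places appear under several spellings. A possible clean-up, sketched here on hypothetical rows, is to normalise the names and to group by state and place together, since a name such as Washington can occur in more than one state. The helper `normalise_place` and the rule that expands "St" to "Saint" are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd

# Hypothetical rows showing three spellings of the same place.
toy = pd.DataFrame({
    "state": ["Florida", "Florida", "Florida", "Louisiana"],
    "city_or_county": ["St Petersburg", "Saint Petersburg", " saint petersburg", "Iberia"],
})

def normalise_place(names: pd.Series) -> pd.Series:
    """Trim whitespace, title-case, and expand a leading 'St' to 'Saint'."""
    cleaned = names.str.strip().str.title()
    return cleaned.str.replace(r"^St\.?\s+", "Saint ", regex=True)

toy["place"] = normalise_place(toy["city_or_county"])

# Group on state and place together so equal names in different states stay separate.
place_counts = toy.groupby(["state", "place"]).size()
print(place_counts)
```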
2.3 Reported Number of Deaths by State
In [123]:
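# NOTE: 'count' tallies the rows (incidents) per state rather than adding up n_killed,
# so the table below repeats the incident counts from 2.1; .sum() would total the deaths.
state = df3.groupby('state')['n_killed'].count().index.values
Number_of_deaths = df3.groupby('state')['n_killed'].agg('count')
data = {'State':state,'Count':Number_of_deaths}
df_death = pd.DataFrame(data)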
In [154]:
top10 = df_death.loc[:,['State','Count']].sort_values(by='Count',ascending=False).head(10)
In [174]:
top10 = top10.reset_index(drop=True)
top10
Out[174]:
State Count
0 Illinois 17556
1 California 16306
2 Florida 15029
3 Texas 13577
4 Ohio 10244
5 New York 9712
6 Pennsylvania 8929
7 Georgia 8925
8 North Carolina 8739
9 Louisiana 8103
In [161]:
plt.figure(figsize=(8,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Count',x='State',data=top10,palette='crest')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Top 10 USA States with the Highest Number of Deaths Resulting from Gun Violence:2013-2018")
plt.ylabel("Number of deaths")
Out[161]:
Text(0, 0.5, 'Number of deaths')
In [160]:
bottom10 = df_death.loc[:,['State','Count']].sort_values(by='Count',ascending=True).head(10)
In [175]:
bottom10 = bottom10.reset_index(drop=True)
bottom10
Out[175]:
State Count
0 Hawaii 289
1 Vermont 472
2 Wyoming 494
3 South Dakota 544
4 North Dakota 573
5 Montana 638
6 Idaho 661
7 Rhode Island 895
8 Maine 907
9 New Hampshire 964
In [162]:
plt.figure(figsize=(8,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Count',x='State',data=bottom10,palette='crest')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("10 USA States with the Lowest Number of Deaths Resulting from Gun Violence:2013-2018")
plt.ylabel("Number of deaths")
Out[162]:
Text(0, 0.5, 'Number of deaths')
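In the cells above, `agg('count')` is applied to `n_killed`. On a grouped column, `count` returns the number of non-null rows in each group, which is the number of incidents, whereas `sum` adds up the values, which is the number of deaths. That would explain why the tables in this subsection match the incident counts in 2.1. The toy example below, with made-up values, illustrates the difference.

```python
import pandas as pd

# Hypothetical incidents: state A has two incidents (0 and 3 deaths), state B has one.
toy = pd.DataFrame({"state": ["A", "A", "B"], "n_killed": [0, 3, 1]})

# Named aggregation: compute both statistics side by side.
by_state = toy.groupby("state")["n_killed"].agg(incidents="count", deaths="sum")
print(by_state)
```

For state A the two columns differ (2 incidents, 3 deaths), which is the distinction a deaths-by-state table depends on.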
2.4 Reported Number of Deaths by City or County
In [177]:
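# NOTE: as in 2.3, 'count' tallies incidents per city or county rather than adding up
# n_killed, so the table below repeats the counts from 2.2; .sum() would total the deaths.
city = df3.groupby('city_or_county')['n_killed'].count().index.values
Number_of_deaths = df3.groupby('city_or_county')['n_killed'].agg('count')
data = {'city':city,'Count':Number_of_deaths}
df_city = pd.DataFrame(data)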
In [178]:
# Top 10 cities or counties with highest number of deaths
top10_df = df_city.loc[:,['city','Count']].sort_values(by='Count',ascending=False).head(10)
top10_df = top10_df.reset_index(drop=True)
top10_df
Out[178]:
city Count
0 Chicago 10814
1 Baltimore 3943
2 Washington 3279
3 New Orleans 3071
4 Philadelphia 2963
5 Houston 2501
6 Saint Louis 2501
7 Milwaukee 2487
8 Jacksonville 2448
9 Memphis 2386
In [153]:
plt.figure(figsize=(8,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Count',x='city',data=top10_df,palette='crest')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("Top 10 USA Cities with the Highest Number of Deaths Resulting from Gun Violence:2013-2018")
plt.ylabel("Number of deaths")
Out[153]:
Text(0, 0.5, 'Number of deaths')
In [158]:
bottom10_df = df_city.loc[:,['city','Count']].sort_values(by='Count',ascending=True).head(10)
In [179]:
bottom10_df = bottom10_df.reset_index(drop=True)
bottom10_df
Out[179]:
city Count
0 jefferson parish (county) 1
1 Iberia (county) 1
2 Iberia 1
3 Hypoluxo 1
4 Hyndman 1
5 Hyden 1
6 St Petersburg 1
7 Hydaburg 1
8 Hyattsville (Tuxedo) 1
9 Hyattsville (Palmer Park) 1
In [159]:
plt.figure(figsize=(8,3))
sns.set_theme(style="whitegrid",font_scale=1.2)
sns.set_style("whitegrid", {'axes.grid' : False})
ax =sns.barplot(y='Count',x='city',data=bottom10_df,palette='crest')
ax.set_xticklabels(ax.get_xticklabels(),rotation = 60)
plt.title("10 USA Cities with the Lowest Number of Deaths Resulting from Gun Violence:2013-2018")
plt.ylabel("Number of deaths")
Out[159]:
Text(0, 0.5, 'Number of deaths')

3. Characteristics of the participants

In [129]:
mappingCol1 = 'participant_type_dic'

def MapRows(df,mappingCol1,mappingCol2):
    newDic = {'Victim':[],'Suspect':[]}
    for rowName, row in df.iterrows():
        for keys,values in row[mappingCol1].items():
            if(keys in row[mappingCol2]) and (values == 'Victim'):
                newDic['Victim'].append(row[mappingCol2][keys])
            elif(keys in row[mappingCol2]) and ('Suspect' in values):
                newDic['Suspect'].append(row[mappingCol2][keys])
    
    return newDic
In [130]:
mappingCol2 = 'participant_age_dic'
mappingCol3 = 'participant_gender_dic'

Map_type_age = MapRows(df3,mappingCol1,mappingCol2)
In [131]:
Map_type_gender = MapRows(df3,mappingCol1,mappingCol3)
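To make the input that `MapRows` expects more concrete, the sketch below applies the same pairing logic to a single hypothetical incident. The dictionary shape (participant index as key, with role labels such as 'Victim' and 'Subject-Suspect') is an assumption inferred from how the columns are used above, and `pair_by_role` is an illustrative name rather than a function from the notebook.

```python
def pair_by_role(type_dic: dict, attr_dic: dict) -> dict:
    """Collect an attribute (age, gender, ...) separately for victims and suspects."""
    paired = {"Victim": [], "Suspect": []}
    for key, role in type_dic.items():
        if key not in attr_dic:
            continue  # attribute not recorded for this participant
        if role == "Victim":
            paired["Victim"].append(attr_dic[key])
        elif "Suspect" in role:
            paired["Suspect"].append(attr_dic[key])
    return paired

# Hypothetical incident: participant 0 is a victim, 1 and 2 are suspects,
# and no age was recorded for participant 2.
types = {"0": "Victim", "1": "Subject-Suspect", "2": "Subject-Suspect"}
ages = {"0": "34", "1": "19"}
result = pair_by_role(types, ages)
print(result)
```

Participants without a recorded attribute are skipped, so the victim and suspect lists can be shorter than the number of participants.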
In [132]:
def countDic(L):
    dic={}
    for i in L:
        if i not in dic:
            dic[i] = 1
        else:
            dic[i] += 1
    return dic
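The counting done by `countDic` is also available in the standard library as `collections.Counter`, which additionally offers helpers such as `most_common`. A minimal sketch with a hypothetical list of ages:

```python
from collections import Counter

# Hypothetical list of ages, stored as strings as in the participant dictionaries.
ages = ["19", "18", "19", "25", "19"]

age_counts = Counter(ages)          # maps each distinct value to its frequency
print(age_counts.most_common(1))    # the most frequent value and its count
```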
3.1 Victim Age Distribution
In [133]:
vic_age_list = list(countDic(Map_type_age['Victim']).keys())
vic_age_count = list(countDic(Map_type_age['Victim']).values())
In [134]:
data = pd.DataFrame({'Age':vic_age_list,'Count':vic_age_count})
data['Age'] = pd.to_numeric(data['Age'],errors='coerce')
data['Age'].dtype
Out[134]:
dtype('int64')
In [135]:
data = data.loc[:,['Age','Count']].sort_values(by='Age',ascending=True)
In [136]:
trace1 = go.Bar(
        x = data.Age,
        y = data.Count,
        name = 'Victims Age Distribution',
        marker = dict(
                color = 'rgb(55,83,109)'))
data = [trace1]
layout = go.Layout(
    title="Victims' Age Distribution",
    xaxis=dict(
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)',
        ),
        range=[0,100]
    ),
    yaxis=dict(
        title='Count',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    legend=dict(
        x=0,
        y=1.0,
        bgcolor='rgba(255, 255, 255, 0)',
        bordercolor='rgba(255, 255, 255, 0)'
    ),
    barmode='group',
    bargap=0.15,
    bargroupgap=0.1
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

The victims' age distribution is skewed to the right: its right tail is long. The most affected ages are 14 to 45 years, with at least 1000 victims recorded for each age. The most common victim age is 19 years. As observed from the bar chart, there are also victims below 14 years. The data records 89 victims who are less than 1 year old and 210 victims who are 1 year old.
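One way to check the direction of skew numerically is to expand an Age/Count frequency table back into individual ages and compute the sample skewness. A positive value indicates a longer right tail, which also tends to pull the mean above the median. The frequencies below are invented for illustration and are not the notebook's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frequency table in the same Age/Count layout as above.
freq = pd.DataFrame({"Age": [15, 20, 25, 40, 70], "Count": [2, 5, 3, 1, 1]})

# Repeat each age by its count to recover one value per person.
ages = pd.Series(np.repeat(freq["Age"].to_numpy(), freq["Count"].to_numpy()))

skewness = ages.skew()
print(f"skewness={skewness:.2f}, mean={ages.mean():.2f}, median={ages.median():.1f}")
```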

3.2 Suspects' Age Distribution
In [137]:
sus_age_list = list(countDic(Map_type_age['Suspect']).keys())
sus_age_count = list(countDic(Map_type_age['Suspect']).values())
In [150]:
data2 = pd.DataFrame({'Age':sus_age_list,'Count':sus_age_count})
data2['Age'] = pd.to_numeric(data2['Age'],errors='coerce')
data2 = data2.sort_values(by='Age',ascending=True)
In [139]:
trace1 = go.Bar(
        x = data2.Age,
        y = data2.Count,
        name = 'Suspects Age Distribution',
        marker = dict(
                color = 'rgb(55,83,109)'))
data = [trace1]
layout = go.Layout(
    title="Suspects' Age Distribution",
    xaxis=dict(
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)',
        ),
        range=[0,100]
    ),
    yaxis=dict(
        title='Count',
        titlefont=dict(
            size=16,
            color='rgb(107, 107, 107)'
        ),
        tickfont=dict(
            size=14,
            color='rgb(107, 107, 107)'
        )
    ),
    legend=dict(
        x=0,
        y=1.0,
        bgcolor='rgba(255, 255, 255, 0)',
        bordercolor='rgba(255, 255, 255, 0)'
    ),
    barmode='group',
    bargap=0.15,
    bargroupgap=0.1
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

The suspects' age distribution is skewed to the right: its right tail is long. The majority of suspects are aged 15 to 45 years. The most common suspect ages are 18 and 19. There are also records of suspects who are less than 4 years old; these records are perhaps errors. There are also 36 suspects aged 5, 41 suspects aged 6, 31 suspects aged 7, 37 suspects aged 8, 38 suspects aged 9 and 48 suspects aged 10. The data shows records of minors having access to guns.

3.3 Victims' Gender Distribution
In [140]:
vic_gender_list = list(countDic(Map_type_gender['Victim']).keys())
vic_gender_count = list(countDic(Map_type_gender['Victim']).values())
victims_gender = pd.DataFrame({'Gender':vic_gender_list,'Count':vic_gender_count})
In [141]:
victims_gender[victims_gender['Gender']=='Male, female'] = np.nan
In [142]:
victims_gender.dropna(inplace=True)
victims_gender
Out[142]:
Gender Count
0 Male 136394.0
1 Female 30630.0
In [143]:
# Plotting a donut chart
plt.figure(figsize=(8,6))
explode = (0,0)
plt.style.use('ggplot')
plt.title("Number of Victims by gender involved in gun-related violence: 2013-2018")
plt.pie(x=victims_gender['Count'],explode=explode,labels=victims_gender['Gender'],autopct = '%.2f%%',shadow=False,startangle=0)
plt.axis('equal')
plt.legend(loc='upper right')
circle = plt.Circle(xy=(0,0),radius = 0.7,facecolor='white')
plt.gca().add_artist(circle)
plt.show()

82% of the victims are male and 18% of the victims are female.

3.4 Suspects' Gender Distribution
In [144]:
sus_gender_list = list(countDic(Map_type_gender['Suspect']).keys())
sus_gender_count = list(countDic(Map_type_gender['Suspect']).values())
In [145]:
suspect_gender = pd.DataFrame({'Gender':sus_gender_list,'Count':sus_gender_count})
suspect_gender
Out[145]:
Gender Count
0 Female 11746
1 Male 167708
In [151]:
# Plotting a donut chart
plt.figure(figsize=(8,6))
explode = (0,0)
plt.style.use('ggplot')
plt.title("Number of Suspects by gender involved in gun-related violence: 2013-2018")
plt.pie(x=suspect_gender['Count'],explode=explode,labels=suspect_gender['Gender'],autopct = '%.2f%%',shadow=False,startangle=0)
plt.axis('equal')
plt.legend(loc='upper right')
circle = plt.Circle(xy=(0,0),radius = 0.7,facecolor='white')
plt.gca().add_artist(circle)
plt.show()

93% of the suspects are male and 7% are female.
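The gender shares shown in the two donut charts can also be computed directly from the count tables above. The sketch below uses the victim and suspect counts reported in sections 3.3 and 3.4; the helper `shares` is an illustrative name.

```python
import pandas as pd

# Counts copied from the victim and suspect gender tables above.
victims = pd.Series({"Male": 136394, "Female": 30630})
suspects = pd.Series({"Female": 11746, "Male": 167708})

def shares(counts: pd.Series) -> pd.Series:
    """Each category's percentage of the total, rounded to two decimals."""
    return (counts / counts.sum() * 100).round(2)

victim_share = shares(victims)
suspect_share = shares(suspects)
print(victim_share)
print(suspect_share)
```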

4. Conclusion

The data used in the analysis contains records of gun violence incidents from January 2013 to March 2018, compiled from all states in the United States of America. There are 239677 incidents of gun violence recorded. Of these, 185835 incidents recorded 0 deaths, and the rest resulted in at least 1 death. At least 1 person was injured in 97190 of the incidents. It is concerning to note that the status of 90% of the guns involved in gun violence is unknown; 9.1% of the guns are confirmed stolen and 1% are confirmed not stolen. The handgun is the most commonly used type of weapon to commit crime. Other types of guns recorded include 9mm, 223 Rem, shotgun and 22LR, which are among the most commonly used guns in the US.

The data analysed has only 278 incidents recorded in 2013. This may be the result of an error of omission. The number of incidents recorded increased by 18.4% from 2014 to 2017. The number of deaths caused by gun violence increased by 24% from 2014 to 2017, and the number of injuries associated with gun violence increased by about 33%. The analysis shows that gun violence incidents are lowest on Thursdays and as much as 14% higher on Sundays. The highest numbers of incidents, injuries and deaths are all recorded on Saturdays and Sundays.

4.2 What age group and/or gender has a higher probability of resorting to gun violence

The majority of suspects are aged between 15 and 45 years. The most common suspect age is 19, followed by 18. There are also records of 36 suspects aged 5, 41 suspects aged 6, 31 suspects aged 7, 37 suspects aged 8, 38 suspects aged 9 and 48 suspects aged 10. The gun control bill recently passed by Congress imposes tougher background checks for buyers below the age of 21. This addresses one of the issues, since the most common suspect ages are 18 and 19. However, I think access to guns by minors needs to be investigated. 93% of the suspects are male and 7% are female.

4.3 Could we explain some geographical pattern of gun violence in the US and the influence of gun control laws

The states which recorded the highest and lowest number of incidents:

In [180]:
top10_incidents,bottom10_incidents
Out[180]:
(            State  Number_of_incidents
 0        Illinois                17556
 1      California                16306
 2         Florida                15029
 3           Texas                13577
 4            Ohio                10244
 5        New York                 9712
 6    Pennsylvania                 8929
 7         Georgia                 8925
 8  North Carolina                 8739
 9       Louisiana                 8103,
            State  Number_of_incidents
 0         Hawaii                  289
 1        Vermont                  472
 2        Wyoming                  494
 3   South Dakota                  544
 4   North Dakota                  573
 5        Montana                  638
 6          Idaho                  661
 7   Rhode Island                  895
 8          Maine                  907
 9  New Hampshire                  964)

Cities which recorded the highest and lowest number of incidents:

In [181]:
top10_city,bottom10_city
Out[181]:
(           city  Number_of_incidents
 0       Chicago                10814
 1     Baltimore                 3943
 2    Washington                 3279
 3   New Orleans                 3071
 4  Philadelphia                 2963
 5       Houston                 2501
 6   Saint Louis                 2501
 7     Milwaukee                 2487
 8  Jacksonville                 2448
 9       Memphis                 2386,
                         city  Number_of_incidents
 0  jefferson parish (county)                    1
 1            Iberia (county)                    1
 2                     Iberia                    1
 3                   Hypoluxo                    1
 4                    Hyndman                    1
 5                      Hyden                    1
 6              St Petersburg                    1
 7                   Hydaburg                    1
 8       Hyattsville (Tuxedo)                    1
 9  Hyattsville (Palmer Park)                    1)

States which recorded the highest and lowest number of deaths:

In [182]:
top10,bottom10
Out[182]:
(            State  Count
 0        Illinois  17556
 1      California  16306
 2         Florida  15029
 3           Texas  13577
 4            Ohio  10244
 5        New York   9712
 6    Pennsylvania   8929
 7         Georgia   8925
 8  North Carolina   8739
 9       Louisiana   8103,
            State  Count
 0         Hawaii    289
 1        Vermont    472
 2        Wyoming    494
 3   South Dakota    544
 4   North Dakota    573
 5        Montana    638
 6          Idaho    661
 7   Rhode Island    895
 8          Maine    907
 9  New Hampshire    964)

Cities which recorded the highest and lowest number of deaths:

In [184]:
top10_df,bottom10_df
Out[184]:
(           city  Count
 0       Chicago  10814
 1     Baltimore   3943
 2    Washington   3279
 3   New Orleans   3071
 4  Philadelphia   2963
 5       Houston   2501
 6   Saint Louis   2501
 7     Milwaukee   2487
 8  Jacksonville   2448
 9       Memphis   2386,
                         city  Count
 0  jefferson parish (county)      1
 1            Iberia (county)      1
 2                     Iberia      1
 3                   Hypoluxo      1
 4                    Hyndman      1
 5                      Hyden      1
 6              St Petersburg      1
 7                   Hydaburg      1
 8       Hyattsville (Tuxedo)      1
 9  Hyattsville (Palmer Park)      1)

Consider first the top 3 states with the highest incidents of gun violence: Illinois, California and Florida. In 2013, Illinois adopted the Firearm Concealed Carry Act, allowing individuals to obtain a licence to carry concealed handguns in public. A licence is not required to carry a concealed handgun on a person's own property, including his or her home or place of business. Nor is a licence required to carry a concealed handgun on the land or in the home of another person, as long as it is with that person's permission. To purchase a firearm, one should be at least 21 years old; otherwise, parental consent is needed. Additionally, one should be in possession of a Firearm Owner's Identification, but persons with a Concealed Carry licence can purchase a handgun without the Firearm Owner's Identification.

In California, a US citizen or legal resident at least 18 years old may carry a handgun anywhere within his or her place of residence, place of business or private property. A permit or licence is not required. Concealed carry is legal with a California Concealed Carry Weapons licence. The minimum age allowed is 18. California does not require a permit to purchase firearms.

In Florida, a permit is not required to own or purchase a handgun. A concealed weapon permit is required to carry a concealed weapon. However, persons are permitted to carry concealed firearms without a permit under these circumstances:

  • Having a gun in your home or place of business
  • Travelling in private transport with a handgun securely encased
  • Being an independent investigator employed by public defenders of the state
  • Working in a lawful business that deals, manufactures, services or repairs guns.

The minimum age required to purchase, own or carry a concealed weapon is 21.

Now consider the three states with the lowest gun violence incidents: Hawaii, Vermont and Wyoming. Hawaii requires a permit to acquire a handgun, and persons acquiring a handgun must be at least 21 years old. A licence is also required to carry a firearm in public. Wyoming and Hawaii have similar gun control laws.

In Vermont, open carry is legal and no licence is required for it. Vermont does not require a licence to carry a concealed firearm, and it does not prohibit possession of machine guns. A permit is not required to purchase a firearm. The legal age limit to purchase, own or carry a gun is 16 years old.

Gun control laws may not influence the number of gun violence incidents; other factors may be at play. Vermont has lenient gun control laws, yet it is one of the states with the lowest incidents of gun violence, while gun control laws in Illinois are much more stringent than in Vermont. These comparisons are based on raw incident counts, which also reflect differences in state population.
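The state comparison above relies on raw incident counts. One way to make states of different sizes comparable is a rate per 100,000 residents. The sketch below shows the calculation on placeholder figures: the state names, incident counts and populations are all hypothetical, and a real comparison would need population estimates from an external source that is not part of this dataset.

```python
import pandas as pd

# Hypothetical placeholder figures, NOT taken from the dataset or from census data.
toy = pd.DataFrame({
    "State": ["State A", "State B"],
    "Number_of_incidents": [12000, 500],
    "population": [12_000_000, 250_000],
})

# Incidents per 100,000 residents.
toy["incidents_per_100k"] = toy["Number_of_incidents"] * 100_000 / toy["population"]
print(toy.sort_values("incidents_per_100k", ascending=False))
```

In this toy example the state with far fewer incidents has the higher rate, which is why a ranking by raw counts and a ranking by rate can differ.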
